One-way analysis of variance: Diagnostics and Exemplar 9.1

DATAX121-23A (HAM) & (SEC) - Introduction to Statistical Methods

Briefly: Assumptions for inference on 3 or more μis

  1. Three or more independent groups
  2. Within group: Independent observations
  3. All groups have a similar measure of spread
  4. Within group: It is approximately Normally distributed

Fitting a one-way ANOVA to data

Counter-intuitively, we often check the third and fourth assumptions after fitting a one-way ANOVA model to the data

The R function, lm(), is “saying” the following equation, the means-only model, is appropriate for the data

\[ \begin{aligned} y_{ij} = \mu_i + \varepsilon_{ij}, &~ \text{where} ~ \varepsilon_{ij} \sim \text{Normal}(0, \sigma_\varepsilon) ~ \text{and} \\ &~ \text{we have up to} ~ k ~ \text{different values of} ~ i \end{aligned} \]

By fitting the model first, we note that the residuals, \(\varepsilon_{ij}\), are what we want to have a similar spread after accounting for the group means, and we want to be Normally distributed

Details on the equation for one-way ANOVA

\[ \begin{aligned} y_{ij} = \mu_i + \varepsilon_{ij}, &~ \text{where} ~ \varepsilon_{ij} \sim \text{Normal}(0, \sigma_\varepsilon) ~ \text{and} \\ &~ \text{we have up to} ~ k ~ \text{different values of} ~ i \end{aligned} \]

where:

  • \(k\) is the total number of groups
  • \(y_{ij}\) is the observed value for the \(j\)th observation in the \(i\)th group
  • \(\mu_i\) is the true mean of the \(i\)th group
  • \(\varepsilon_{ij}\) is the residual value for the \(j\)th observation in the \(i\)th group
  • \(\text{Normal}(0, \sigma_\varepsilon)\) means that we expect the residuals Normally distributed about 0 and have a common standard deviation \(\sigma_\varepsilon\)

The best estimate of the \(\mu_i\)s are the \(\bar{x}_i\)s, where the best estimate of \(\sigma_\varepsilon\) is \(\sqrt{MSR}\)

So for CS 2.2 the equation of the fitted model is:

\(\text{GillRate}_{ij} = \mu_i + \varepsilon_{ij}\), where \(\varepsilon_{ij} \sim \text{Normal}(0, \sigma_\varepsilon)\) and \(i\) can either be Low, Medium, or High

Rule-of-thumb to assess 3.

The ratio of the group with the largest sample standard deviation and the group with the smallest sample standard deviation is less than 2. That is,

\[ \frac{s_\text{Largest}}{s_\text{Smallest}} < 2 \]

# Recall that if you wanted a specific summary statistic for each 
#   group you had to "spell it out" for R
split(respiration.df, ~ Calcium) |>
  lapply(\(x) sd(x$GillRate))
$High
[1] 13.77675

$Low
[1] 16.23481

$Medium
[1] 14.28366

Diagnostic plot to assess 3.

This diagnostic plot is a scatter plot known as the Residuals versus Fitted plot

If the one-way ANOVA (means-only) model is appropriate for the data:

  • The residuals (y-axis) are centred at 0
  • The residuals’ spread does not depend on the fitted value (x-axis)

If the fitted model is a one-way ANOVA, an observation’s fitted value is equal to the sample mean of the group it belongs to

So for CS 2.2 we should expect the fitted values to correspond to 58.17 bpm, 68.5 bpm, and 58.67 bpm for fish in tanks with high, low, and medium calcium levels, respectively

# A diagnostic plot to help evaluate the third assumption
plot(respiration.fit, which = 1)

Histogram of residuals to assess 4.

As per Slides 3–4, it is the residuals that need to be Normally distributed

We should only check this assumption if the third assumption has been satisfied

If the one-way ANOVA (means-only) model is appropriate for the data:

  • The residuals are unimodal—One “peak”
  • The shape of the residuals’ distribution is a “bell-shape”

So for CS 2.2 we may not trust neither the inference made with the F-test nor the multiple comparisons

# A histogram to help evaluate the third assumption
MASS::truehist(respiration.fit$residuals, nbins = 10, 
               xlab = "Residuals", ylab = "Density")
curve(dnorm(x, sd = summary(respiration.fit)$sigma), 
      add = TRUE, lty = 2)

Diagnostic plot to assess 4.

This diagnostic plot is a scatter plot known as the Normal Q-Q plot

For any linear model that assumes the residuals are Normally distributed: The values of observed residuals (y-axis) agree with their theoretical values (x-axis)

  • They need to be approximately exact for “small” datasets (\(n < 20\))
  • The tails of the Normal Q-Q plot can be a kind of distorted for “medium” datasets
    (\(20 \leq n \leq 50\))
  • The tails of the Normal Q-Q plot can be a not heavily distorted for “large” datasets
    (\(n > 50\))
# A diagnostic plot to help evaluate the fourth assumption
plot(respiration.fit, which = 2)

E 9.1: Choir heights

Context

It was of interest to see whether the average height of choir singers differs by voice part. A random sample of 237 singers from a choral society was done to collect data to answer this question.

Variables
Height A number denoting a singer’s height (in cm)
Part A factor denoting a singer’s voice part, either Soprano, Alto, Tenor, or Bass
# Read in the data
choir.df <- read.csv("datasets/choir-heights.csv")

# Take a quick peek
head(choir.df)
  Height    Part
1 162.56 Soprano
2 160.02 Soprano
3 165.10    Alto
4 177.80    Alto
5 175.26   Tenor
6 172.72   Tenor
# Number of observations in each group
split(choir.df, ~ Part) |>
  lapply(\(x) nrow(x))
$Alto
[1] 62

$Bass
[1] 68

$Soprano
[1] 65

$Tenor
[1] 42

Visualising the data

Describe any features of the data

Each voice part is unimodal. (The extra mode for the Altos is an artefact of the histogram, as the height of its’ 170–175 cm bar is two observations!)

Within voice part: The distribution of heights all seem relatively symmetrical, suggesting that each group’s sample mean is a good measure of centre

Notably, on average, the tenors and basses seem taller than the sopranos and altos. This makes sense once you consider men can only be tenors and basses, and women can only be sopranos and altos

histogram( ~ Height | Part, data = choir.df, xlab = "Height (cm)",
          main = "Height of singers by voice part", type = "c")

Figure: The height of the 237 singers by their voice part

Fitting a one-way ANOVA

The equation of the fitted model?

\(\text{Height}_{ij} = \mu_i + \varepsilon_{ij}\), where \(\varepsilon_{ij} \sim \text{Normal}(0, \sigma_\varepsilon)\) and \(i\) can either be Soprano, Alto, Tenor, or Bass

# Fit the means-only model to the data
choir.fit <- lm(Height ~ Part, data = choir.df)

# Produce a diagnostic plot to help evaluate the third assumption
plot(choir.fit, which = 1)

# Produce a diagnostic plot to help evaluate the fourth assumption
plot(choir.fit, which = 2)

Assumptions for inference met?

  1. Three or more independent groups
    A random sample of singers from a choral society was conducted. So, by chance, the sample proportion of each voice part should agree with each part’s corresponding unknown population proportion.
  2. Within group: Independent observations
    Since the first assumption was met because of random sampling, we also met this assumption.
  3. All groups have a similar measure of spread
    The diagnostic plot to help assess this assumption, Residuals vs Fitted, suggests it is fine. Most of the “strange” residuals are from taller observations.
  4. Within group: It is approximately Normally distributed
    The diagnostic plot to help assess this assumption, Normal Q-Q, suggests that once we account for the differences between sample means, the residuals are plausibly Normally distributed as most residuals do not deviate from the dashed line

The F-test

The hypothesis statements for the F-test in the context of E 9.1 are: \[ H_0\!: \mu_S = \mu_A = \mu_T = \mu_B \\ H_1\!: \text{At least one} ~ \mu_i \neq \mu_j \]

Interpret the result of the F-test using a 5% significance level

We have very strong evidence against the null that the population mean heights of Soprano, Alto, Tenors, and Basses are the same, in favour of the alternative, that there is at least one difference between in the population mean heights of two out of the four voice parts (p-value ≈ 0).

# The table that summarises how the "variability" of the 
#   heights is explained by the voice part and what's leftover
anova(choir.fit)
Analysis of Variance Table

Response: Height
           Df  Sum Sq Mean Sq F value    Pr(>F)    
Part        3 12520.9  4173.6  100.97 < 2.2e-16 ***
Residuals 233  9631.5    41.3                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple comparisons

Which pairwise comparisons between means are significant at the 5% level?

All of them according to the 95% confidence intervals and the adjusted p-values!

With 95% confidence, we estimate that:

  • Altos are, on average, somewhere between 11.0 and 23.0 cm shorter than basses
  • Altos are, on average, somewhere between 0.3 and 6.2 cm taller than sopranos
  • Altos are, on average, somewhere between 6.9 and 13.5 cm shorter than tenors
  • Basses are, on average, somewhere between 14.3 and 20.1 cm taller than sopranos
# Load the emmeans R package
library(emmeans)

# Produce the pairwise confidence intervals and adjust
#   them for multiple comparisons
emmeans(choir.fit, ~ Part) |>
  pairs(infer = TRUE)
 contrast        estimate   SE  df lower.CL upper.CL t.ratio p.value
 Alto - Bass       -13.92 1.13 233  -16.842   -11.00 -12.330  <.0001
 Alto - Soprano      3.25 1.14 233    0.296     6.20   2.847  0.0246
 Alto - Tenor      -10.20 1.28 233  -13.530    -6.88  -7.942  <.0001
 Bass - Soprano     17.17 1.12 233   14.284    20.06  15.396  <.0001
 Bass - Tenor        3.72 1.26 233    0.451     6.98   2.945  0.0186
 Soprano - Tenor   -13.45 1.27 233  -16.748   -10.16 -10.570  <.0001

Confidence level used: 0.95 
Conf-level adjustment: tukey method for comparing a family of 4 estimates 
P value adjustment: tukey method for comparing a family of 4 estimates